Functional Pronunciation Dictionaries

Author

  • Markus Forsberg
Abstract

This document describes Functional Pronunciation Dictionaries (FPD), a language-independent system for defining pronunciation lexicons. The starting point is Functional Morphology (FM) [3, 2], which led us to ask: given a lexical resource already defined in FM, what would be an efficient way to extend that lexicon with high-quality pronunciation information? We use Swedish to illustrate the ideas of FPD. FPD rests on the assumption that automatic transcription is correct for most words, and that an erroneous transcription is normally not completely wrong: typically only one or two sounds are incorrect. If this assumption holds, automatic transcription is a reasonable first step in creating a high-quality pronunciation dictionary, and the lexicographer's job is to adjust the incorrect transcriptions. M. Uneson [8] reports a precision of 95.7% for his automatic transcription system for Swedish, i.e. approximately every twentieth word is transcribed incorrectly. Uneson lists the most problematic cases: compounds; lexical stress; lexical word accent; loan words; pronunciation that cannot be derived from the orthography; and occasional exceptions to normal orthographic markings. He further reports that the main problem is compound resolution, since compound boundaries are unmarked in Swedish. Compound resolution is, of course, tied to the problem of assigning lexical stress.
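The workflow described in the abstract (transcribe automatically first, then let a lexicographer adjust only the wrong entries) can be sketched as follows. This is a minimal illustration under stated assumptions, not the FPD implementation: the rule table `TOY_RULES`, the functions `auto_transcribe` and `build_lexicon`, and the phonetic symbols are all hypothetical and do not represent real Swedish phonology.

```python
from typing import Dict, Iterable

# Hypothetical toy grapheme-to-sound rules, for illustration only.
# These are NOT a real Swedish rule set and the symbols are made up.
TOY_RULES: Dict[str, str] = {
    "ng": "N",
    "ch": "S",
}


def auto_transcribe(word: str, rules: Dict[str, str] = TOY_RULES) -> str:
    """Greedy longest-match transcription; unknown letters map to themselves."""
    sounds = []
    max_len = max(len(grapheme) for grapheme in rules)
    i = 0
    while i < len(word):
        # Try the longest grapheme first, then shorter ones.
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                sounds.append(rules[chunk])
                i += size
                break
        else:
            # No rule matched at this position: copy the letter unchanged.
            sounds.append(word[i])
            i += 1
    return " ".join(sounds)


def build_lexicon(words: Iterable[str], corrections: Dict[str, str]) -> Dict[str, str]:
    """Use the manual correction when one exists, else the automatic result."""
    return {w: corrections.get(w, auto_transcribe(w)) for w in words}


# A lexicographer only supplies entries the automatic step got wrong,
# e.g. a loan word whose pronunciation does not follow the toy rules.
manual_corrections = {"garage": "g a r A: S"}  # hypothetical transcription
lexicon = build_lexicon(["sang", "chef", "garage"], manual_corrections)
```

The design point is that the correction table stays small: if most automatic transcriptions are right, the lexicographer touches only a small fraction of the entries.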

Similar resources

Efficient compression method for pronunciation dictionaries

Pronunciation dictionaries are often used with other data-driven methods to model the pronunciations in phoneme-based automatic speech recognition (ASR) and text-to-speech (TTS) systems. The dictionaries usually take a great amount of memory, which is a limiting factor in portable handheld devices. Compressing the pronunciation dictionaries results in minimal transmission bandwidth and less stora...

The efficient generation of pronunciation dictionaries: human factors during bootstrapping

Bootstrapping techniques have significant potential for the efficient generation of linguistic resources such as electronic pronunciation dictionaries. We describe a system and an approach to bootstrapping for the development of such dictionaries, and report on experiments conducted to investigate the efficiency and effectiveness of the system, focusing on the human factors that influence the p...

Automatic Learning and Optimization of Pronunciation Dictionaries

Pronunciation dictionaries are the interface between orthographic and phonetic representation of the speech signal and are thereby a substantial component of speech recognition systems. In many systems simple canonical pronunciation forms are used within the dictionary. They represent the “correct” pronunciation as they are found in lexicons and neither contain the most frequent pronunciation n...

Learning Pronunciation Rules for English Graphemes Using the Version Space Algorithm

We describe a technique for learning pronunciation rules based on the Version Space algorithm. In particular, we describe how to learn pronunciation rules for a representative subset of the English graphemes. We present a learning procedure called LEP-G.1 (learning to pronounce English graphemes) that learns English pronunciation rules from examples in the form of word-pronunciation pairs. With...

Automatic Generation of Pronunciation Dictionaries

In this report we describe a data-driven approach for creating pronunciation dictionaries for a new, unseen target language by voting among phoneme recognizers in nine different languages other than the target language. In this process, recordings of the new language that are transcribed at the word level are decoded by the phoneme recognizers. This results in a hypothesis of nine phonemes per t...

Publication date: 2007